This study uses Partial Credit Rasch analysis to examine a complex data set of
student responses to survey items relating to chance and data. The items
were administered in the classroom and collected from 1993 to 2003 in the
Australian state of Tasmania. Data were collected from a total of 5514
individual students across Grades 3 to 11 over the decade and of these students
896 provided at least one repeated measure. As students completed a core of
common items, Rasch analysis could be performed and all students were
subsequently placed on the same logit scale for comparison. The purpose of
the analysis is to consider average cohort change over time and trends in
performance during the first 10 years after the curriculum was introduced in
Tasmania. Implications for the education system and curriculum
implementation are considered.


The topics Chance and Data were officially introduced into the 
mathematics curriculum in the Australian state of Tasmania in 1993 in the 
Mathematics Guidelines K-8 (Department of Education and the Arts, 1993). This 
followed their introduction in the United States by the National Council of 
Teachers of Mathematics (NCTM, 1989) and in Australia in A National 
Statement on Mathematics for Australian Schools (Australian Education Council 
[AEC], 1991). From 1993 a series of projects followed cohorts of Tasmanian
school students over the next decade, gauging their understanding with
classroom-administered surveys, which students were told were for research
only and not associated with their school assessment. The
purposes of the research included constructing models for the development 
of understanding of chance and data concepts. The survey data collected 
throughout the projects also allowed the comparison of cohorts across 
the first decade following the implementation of the Chance and Data 
curriculum in Tasmania. The comparison gave an indication of the success of 
the curriculum in increasing student understanding of the topics. 


Before the introduction of the NCTM's Standards in 1989 there was little 
research into school students' understanding of statistical concepts. Green 
(1983, 1986), Fischbein (1975), and Piaget and Inhelder (1951/1975) were the 
major contributors in the area of probability and Goodchild (1988), Mevarech 
(1983), and Strauss and Bichler (1988) on the topic of average. The structure of 
the new curriculum (following, e.g., Holmes, 1980), however, was more
broadly based, focussing on all aspects of a statistical investigation: data
collection and sampling, data representation, data reduction, probability,
and analysis and inference. The initial research in Tasmania based on a 
representative sample of schools hence set out to cover all components of the 
curriculum (Watson, 1994). Reports on each of the five aspects identified in the 
new curriculum have appeared in recent years from the Tasmanian studies 
and others around the world (e.g., Cai, 1998; Friel, Curcio, & Bright, 2001; 
Jacobs, 1999; Lehrer & Romberg, 1996; Shaughnessy, 2003; Watson, 2006). 

As well as an emphasis on the statistical investigation, from 1990 
statisticians and statistics educators stressed the importance of variation as 
the phenomenon underlying statistical investigations (Moore, 1990; Wild &
Pfannkuch, 1999) and the need for educational research in this area (Green,
1993; Shaughnessy, 1997). This prompted a project in Tasmania, in a group of
schools that had not been involved in earlier studies, that not only explored
student understanding of variation (e.g., Kelly & Watson, 2002; Watson & 
Kelly, 2004a, 2005) but also provided instruction in Chance and Data that 
emphasised variation (e.g., Watson & Kelly, 2002). The surveys in the 
variation project included items from earlier surveys as well as items dealing 
explicitly with variation as it occurs at various stages of a statistical 
investigation. Watson, Kelly, and Izard (2004) reported on the survey 
outcomes of this variation project with overall results indicating that, 
although eight weeks after instruction mean scores improved, after two years 
there was little difference between the average outcomes for students in the 
schools that had experienced the special lessons and the schools that had 
experienced the usual curriculum with no intervention from the project. 

During this decade (1993 to 2003) wider issues of statistical literacy were 
canvassed in the education community, both for school students and adults 
(Gal, 2002; Wallman, 1993; Watson, 1997). Data from 1993, 1995, 1997, 
and 2000 were used by Watson and Callingham (2003) to suggest a 
developmental pathway of understanding of statistical literacy. Using Rasch 
techniques they described six levels of increasing facility with statistical 
ideas. At the Idiosyncratic Level (Level 1), students can read cells in tables 
and carry out one-to-one counting tasks but are likely to produce tautologies 
or other responses unrelated to the tasks presented. At the Informal Level 
(Level 2), they carry out one-step calculations but generally express intuitive
beliefs, for example about probability. At the Inconsistent Level (Level 3),
students are likely to give qualitative rather than quantitative responses to 
tasks and show a limited appreciation of content and context. At Level 4, 
Consistent Non-Critical, they show straightforward engagement with context 
and generally deal with simple means, probabilities and graphs. At Level 5, 
Critical, students appear to appreciate the part variation plays in most 
contexts and are able to question claims that do not require mathematical 
skills, particularly proportional reasoning. Finally at the Critical 
Mathematical Level (Level 6), students engage in critical questioning of tasks, 
employing proportional reasoning and nuanced language in responses. 

In 2003 it was possible to survey students in the schools that had
participated in the original project a decade earlier. A survey was devised
comprising items across the Chance and Data curriculum from the earlier
Tasmanian studies and items from the variation study addressing the
underlying concept of variation. This made it
possible to compare across all years in which surveys were conducted.
Initially comparisons were made only for schools in which original and 
longitudinal data were collected and the hierarchy of statistical literacy 
understanding suggested by Watson and Callingham (2003) was confirmed 
(Watson, Kelly, & Izard, 2005). 

The use of partial credit Rasch analysis (Masters, 1982) allows the 
comparison of all cohorts over the decade due to the use of common items 
linking across the surveys. This type of analysis has not occurred previously 
in relation to understanding of Chance and Data and has the potential to 
identify concepts that are stable or unstable over time. The data collected 
hence allow consideration of the following research questions. 

1. Considering all data collected, what is the trend in student 
performance over the first decade after the introduction of chance 
and data into the mathematics curriculum in the Australian state of 
Tasmania? 

2. Where longitudinal data are available, how do average student 
understandings change over two-, four-, or six-year periods? 

Details on the grade levels for which these questions are addressed are given 
in the Methodology. 

Methodology 

Sample 

A total of 5514 students over the decade completed surveys that are 
analysed in this paper. Of these 896 completed the surveys twice and 264 
completed them three times. A summary of the grades and years in which 
surveys were completed and the sample sizes is given in Table 1. A total of 
6410 surveys was completed in 28 separate samples. 

Table 1

Sample Sizes for All Student Responses by Year and Grade (Numbers in Parentheses
Indicate Students Surveyed Two or Three Times)

Grade  1993         1995         1997         2000         2002    2003        Total
3      322 (147 a)  303          237 (54 b)   176 (114 b)  -       189         1227
5      -            465 (147 a)  226          183 (102 b)  114 b   -           988
6      311 (117 a)  337          233          -            -       174         1055
7      -            -            314 (147 a)  186 (135 b)  102 b   -           602
8      -            374 (117 a)  192          -            -       -           566
9      392 (117 b)  371 (51 b)   105          193 (59 b)   135 b   251 (54 b)  1447
10     -            -            297 (117 a)  -            -       -           297
11     -            118 (117 b)  51 b         -            59 b    -           228
Total  1025         1968         1655         738          410     614         6410

Note. a Students surveyed three times. b Students surveyed twice.

Thirteen state government schools representing all regions of the state of
Tasmania participated in the surveys in 1993, 1995, 1997, and 2003. Ten new 
schools from the Hobart suburban area participated in the surveys in 2000 and 
2002. These were chosen because they were similar to each other and could
therefore provide comparison data about the effectiveness of a teaching
intervention. In the current study, the data for 2000 and 2002 were combined
across schools in which the intervention took place and those in which it did
not. This was because the schools had been matched initially on socio-economic
criteria and after two years there were no differences in average longitudinal
performance (Watson et al., 2004). The data from Grade 11 surveys were collected from the senior
secondary schools for which high schools in the project were feeder schools. 
As senior secondary schooling is not compulsory in Tasmania, there was a 
considerable drop in participation for follow-up surveys at this grade level. 

Instruments 

The instruments used in 1993, 1995, 1997, 2000, and 2002 are included in 
an appendix in Watson and Callingham (2003). Those specific to 2000 and 
2002 are presented in Watson, Kelly, Callingham, and Shaughnessy (2003). 
The selection of items for the 2003 survey was made from the earlier ones, 
with the addition of five new items developed for a parallel survey used in a 
different school system (Callingham & Watson, 2005; Watson & Callingham, 
2005). The items employed over the years 1993-2003 covered the components 
of the Chance and Data curriculum as found in curriculum documents (AEC, 
1991; NCTM, 1989, 2000) as well as specific aspects of variation. In all years 
where surveys took place, students in lower grades answered fewer 
questions than those in higher grades. Although some items were coded on a 
right-wrong (1-0) basis, most were coded in a hierarchical fashion based on 
structure and appropriateness, with codes ranging from 0-2 to 0-5. Details of
coding are found in the sources noted in this paragraph; the coding was based
on the SOLO Taxonomy of Biggs and Collis (1982) or the statistical literacy
hierarchy of Watson (1997).

Analysis 

The data were analysed using Rasch (1960/1980) measurement 
techniques. The initial analyses of data collected in 2000 established anchor 
values for the items in common across years 1993 to 2003, so tests including 
these items could be calibrated on a common scale or continuum of 
achievement. The year 2000 was chosen because that survey contained both 
general items and variation items and hence was an appropriate data set for 
producing anchor values. 
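
As an illustration of the partial credit model (Masters, 1982) on which these
analyses rest, the following minimal Python sketch computes the probability of
each score category for a single item. The function name and the threshold
values are hypothetical and chosen only for illustration; they are not taken
from the surveys or from the software used in the study.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one partial credit item.

    theta: person ability in logits.
    thresholds: step difficulties delta_1..delta_m in logits
                (hypothetical values, not estimates from the study).
    Returns [P(X=0), P(X=1), ..., P(X=m)].
    """
    # Exponent for score x is the sum of (theta - delta_k) for k = 1..x;
    # score 0 corresponds to an empty sum, i.e. 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# An item coded 0-3 with three illustrative step difficulties.
probs = pcm_probabilities(theta=0.5, thresholds=[-1.0, 0.2, 1.5])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the dichotomous Rasch model,
which is how items coded right-wrong (1-0) and items coded hierarchically can
sit within one analysis.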

Using anchor values from the year 2000 data, the year 2003 data were 
analysed and a second, more comprehensive anchor file constructed. These 
anchored item values were then used in a series of analyses in which common 
items from each study year, 1993, 1995 and 1997, were added into the data 
pool, to create an anchor file consisting of 31 items that met the criteria for fit. 
Items that misfitted at any stage were dropped from the anchor file to ensure
that the final anchor file was robust. The final statistics associated with the 
output of this Rasch analysis were acceptable and are reported in Appendix 
A. This file was subsequently used to estimate all other item difficulties and 
to obtain ability estimates for all students in each year. 
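
The role of the anchor file can be sketched as follows, assuming a standard
maximum likelihood estimate of ability with item thresholds held fixed. The
item labels, threshold values, and responses are hypothetical, and the routine
is a generic Newton-Raphson procedure, not necessarily the algorithm used by
the software that produced the results reported here.

```python
import math


def category_probs(theta, deltas):
    """Partial credit category probabilities for one item."""
    exps = [0.0]
    for d in deltas:
        exps.append(exps[-1] + theta - d)
    top = max(exps)
    weights = [math.exp(e - top) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_ability(responses, anchors, iterations=100, tol=1e-8):
    """Maximum likelihood ability (logits) with anchored item thresholds.

    responses: dict of item label -> observed score, for attempted items only.
    anchors: dict of item label -> list of fixed step difficulties.
    """
    theta = 0.0
    observed = sum(responses.values())
    for _ in range(iterations):
        expected = 0.0
        information = 0.0
        for item in responses:
            p = category_probs(theta, anchors[item])
            mean = sum(k * pk for k, pk in enumerate(p))
            expected += mean
            # Variance of the item score is its contribution to information.
            information += sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
        # Newton-Raphson step, limited to one logit to keep updates stable.
        step = max(-1.0, min(1.0, (observed - expected) / information))
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical anchored thresholds for three common items.
anchors = {"A": [-0.5], "B": [0.0, 1.0], "C": [0.5, 1.5, 2.0]}
# A younger student attempts fewer items than an older student.
younger_responses = {"A": 1, "B": 1}
older_responses = {"A": 1, "B": 2, "C": 1}
younger = estimate_ability(younger_responses, anchors)
older = estimate_ability(older_responses, anchors)
print(round(younger, 2), round(older, 2))
```

Because the thresholds are fixed in advance, students who attempted different
subsets of items are still estimated on the same logit scale. A maximum
likelihood estimate of this kind is not defined for zero or perfect scores,
which require separate treatment.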

The scaled achievement scores for each student on each test could then be 
used to determine the initial differences among the grades and the changes 
over the time between testings. The effect sizes for these differences were 
determined using Cohen's (1969) methodology and reported with descriptors 
devised by Cohen (1969) and Izard (2004). The scaled scores of all students 
were then used in subsequent analyses. For each grade for which data were 
collected during the decade, the trend in mean scores is presented. Appendix 
B contains the means and standard deviations for each of the 28 samples 
detailed in Table 1 as well as the comparisons across the years for each grade 
except Grade 10, which was sampled only once. The means are displayed 
graphically in Figure 1 to indicate the trend over grades and over the decade 
in relation to Research Question 1. To reinforce the suggestion of trends in 
growth in understanding over time for individual cohorts, the 11 samples 
(with numbers in parentheses in Table 1) for which longitudinal data were
collected on individual students are considered in relation to Research 
Question 2. Appendix C contains the means, standard deviations, and effect 
sizes for cohort comparisons among these 11 pairs of data sets. 
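
For two independent samples, Cohen's effect size is commonly computed as the
difference in means divided by a pooled standard deviation. The sketch below
assumes that pooled form (the exact formula used is not spelled out in the
text) and applies it to the Grade 3 means, standard deviations, and sample
sizes for 1993 and 2003 given in Appendix B. The function name is illustrative.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Effect size for two independent samples using a pooled SD.

    Positive values mean the second sample scored higher than the first.
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


# Grade 3 in 1993 (M = -0.78, SD = 0.67, n = 322)
# compared with Grade 3 in 2003 (M = -1.25, SD = 0.74, n = 189).
d = cohens_d(-0.78, 0.67, 322, -1.25, 0.74, 189)
print(round(d, 2))  # -0.67
```

This applies to comparisons of different students in the same grade across
years; comparisons of the same students over time involve paired data and
would call for a different computation.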

Results 


Research Question 1 

Figure 1 displays the mean logit scores for each of the 28 samples in the 
study. They are displayed by increasing grade and within each grade by 
increasing year. An overall trend for increasing mean with grade is evident. 
For each grade where longitudinal data were collected, 2003 has the lowest 
mean score. For Grade 9 data are available for every year covered by the 
studies, and here 2003 had the lowest mean. The same is true for 2003 in the 
Grade 3 and Grade 6 data. The comparisons of pairs of years within grades 
are presented in Appendix B, where large differences, using Cohen's (1969) 
criteria, are observed for Grade 3 in a negative direction between 2003 and 
both 1997 and 2000, for Grade 5 a positive difference for both 2000 and 2002 
in relation to 1995 and 1997, and a negative difference for Grade 9 between 
2003 and 1993. 

The improvement in performance from both 1993 and 1995 to 1997 and 
2000 was medium in Cohen's terms for successive Grade 3 groups, as was the 
decline between those years and 2003. The decline in performance for 
successive Grade 6 groups for all three years 1993, 1995, and 1997 to 2003 was 
also medium, as was that for Grade 9 from 1993 to 1995, 1997, and 2000. 
Between 2000 and 2002 for this grade there was a medium improvement but
then a drop in mean performance of the same degree from 2002 to 2003.
Except for the apparent decline in 2003, there appears to be little overall trend 
that would indicate steady improvement or decline across the decade. 
Figure 1. Mean scores for the 28 samples ordered by year within grade level.

Research Question 2 

The collection of longitudinal data for some years allows the observation
of cohort change over time. Given the observations for Research Question 1 it
is not possible to suggest a direct curriculum implementation effect, only to
reinforce observations of trends of improvement with respect to maturity and
general learning within school and society. Appendix C presents the means,
standard deviations and comparisons for these 11 data sets and they are shown
graphically in Figure 2. Lines connect pairs of data sets involving the same
students. Error bars for the means have been omitted for clarity in reading
the graph.

Figure 2. Relative performance for longitudinal data sets.
Key: Grade 3/5/7 [1993, 1995, 1997]; Grade 6/8/10 [1993, 1995, 1997];
Grade 9/11 [1993, 1995]; Grade 9/11 [1995, 1997]; Grade 3/9 [1997, 2003];
Grade 3/5 [2000, 2002]; Grade 5/7 [2000, 2002]; Grade 7/9 [2000, 2002];
Grade 9/11 [2000, 2002].

Considering the three data sets where two successive comparisons were 
possible for the same group of students, for students originally in Grade 3, 
both differences (Grade 3 to Grade 5, 1993 to 1995, and Grade 5 to Grade 7, 
1995 to 1997) were large using Cohen's criteria, whereas for students 
originally in Grade 6, the 1993-1995 (Grade 6 to Grade 8) difference was 
medium and the 1995-1997 (Grade 8 to Grade 10) difference was large. In all 
cases there was improvement, from Grade 3 to Grade 5 to Grade 7, and from 
Grade 6 to Grade 8 to Grade 10. The other large difference, as might be 
expected, was from Grade 3 in 1997 to Grade 9 in 2003, whereas there was 
only a small difference for Grade 9 to Grade 11 in both 1993 to 1995, and 1995 
to 1997. For the 2000 to 2002 data, the difference from Grade 3 to Grade 5 was 
positive and large, whereas from Grade 5 to Grade 7 it was negative and 
small. For Grade 7 to Grade 9 it was again positive and medium, whereas for 
Grade 9 to Grade 11 it was positive and small as for the earlier years. 

Overall the longitudinal data suggest that the greatest improvement in 
understanding occurred between Grade 3 and Grade 5, whereas after a 
questionable period in Grades 6 and 7, again at least medium degrees of 
improvement occurred. Although some improvement continued to Grade 11, 
generally it was small compared to earlier two-year periods. 

Discussion 

Two aspects of this research are considered in the discussion. The first relates 
to the use of Rasch analysis, and the second to the observation of performance 
over the first decade after the introduction of the Chance and Data curriculum. 

Over a decade of research, evolution in thinking occurs and researchers are 
influenced by other happenings in the field. So it was with this research. The 
original tests used with students reflected the goals of the curriculum in Chance 
and Data (AEC, 1991; NCTM, 1989), as well as including items used by earlier 
researchers (e.g., Fischbein & Gazit, 1984) and indicating the emerging interest 
in statistical literacy (Watson, 1997). By the end of the 1990s, however, the 
acknowledged importance of statistical variation meant that further items were 
added to tests from 2000, as well as retaining some from previous studies. The 
use of Rasch analysis allowed the data sets to be combined using common 
items in order to be able to put all students over the decade and all items on the 
same scale. The initial analysis of items by Watson and Callingham (2003, 2005)
and Callingham and Watson (2005) indicated that the original concepts and
the items focusing on variation within the original contexts produce a
unidimensional construct of statistical understanding. The final step in this
construction of instruments means that it is now possible to measure with
confidence students' understanding across the topics associated with statistics.

The other advantage to using Rasch analysis is related to the use of fewer
survey items with younger students. In carrying out comparison studies by 
conventional methods using raw score totals, as was undertaken, for 
example, by Watson and Kelly (2004b), it is necessary to delete some items if 
some grades have not completed them or to do comparisons only between 
certain grades. It may be in such comparisons that some information is lost. 
Using Rasch techniques allows all students to be placed on the same scale and 
hence an overall picture of performance to emerge. 

In terms of following the progress of students in Tasmania over the 
decade since the introduction of the Chance and Data curriculum the picture 
is mixed. Except for one group followed longitudinally from Grade 5 to
Grade 7, students' performance improved to a greater or lesser extent
over the two, four, or six years they were followed. The middle school plateau 
has been observed by others (e.g., Callingham & McIntosh, 2002) and is a 
matter of concern within a context where general development appears to 
occur within all other groups. 

In terms of growth for particular grades over the years, the primary 
Grades 3 and 5 show the greatest improvement over the initial years of the 
curriculum until 2000. Grade 9 showed mixed results over the ten years. 
Although it may not be realistic to expect large differences in successive 
years, over the decade in the original schools surveyed in 1993, the 2003 
results are disappointing for Grades 3, 6, and 9. 

Several possible reasons could be suggested, including a lack
of continuing professional development for teachers over the decade.
Certainly at the time of initial introduction there were curriculum 
implementation officers working in all districts in the state. These positions 
were discontinued in 1997 and later replaced with Literacy and Numeracy 
officers, whose brief was much wider than specific curriculum 
implementation. As well, 2000 was the beginning of the introduction of the 
Essential Learnings Framework (Department of Education, 2002) with a focus 
on values-based education including 18 essential elements, of which "Being 
Numerate" was one. Mathematics as a discipline did not feature in this 
framework. Emphasis on basic numeracy as part of the Essential Learnings 
may have reduced the focus on particular aspects of the mathematics 
curriculum, such as Chance and Data. No other data-driven evidence apart 
from this study could be found to explain the decline in performance in 2003. 
It would be of interest to know if a similar trend occurred in other subject 
areas but no such longitudinal data are known to exist. 

Anecdotal evidence from a senior teacher in one of the schools involved 
from 1993 in the research — a school that had also been involved in a project 
on the theme "Thinking Mathematically" for five years — suggested that 10 
years was not enough to see a change in Chance and Data. Other topics were 
more important and it would take 20 years for an improvement to be seen in 
Chance and Data, both for teachers and students. 

From an analysis perspective this study illustrates the usefulness of 
Rasch analysis for placing students from different grades and different times 
on the same scale in order to make the comparisons and for following the
development of individual students. From an educational point of view the 
outcomes point to some concerns about achieving the goals of the curriculum 
implementation and about the plateau in performance for some cohorts 
across the middle grades. 
Appendix A: Rasch Statistics

Summary Results: 2003 data anchored on 2000 results

Items 1 to 37 2003 data (Run No 8)
Item estimates (thresholds) and case estimates: all on all
(N = 614, L = 31, Probability Level = 0.50)

Statistic                 Item estimates  Case estimates
Mean                      -0.41           -0.42
SD                        1.29            0.89
SD (adjusted)             1.29            0.83
Reliability of estimate   1.00            0.87

Fit statistic             Item estimates  Case estimates
Infit mean square, Mean   0.91            0.94
Infit mean square, SD     0.14            0.37
Outfit mean square, Mean  0.92            0.93
Outfit mean square, SD    0.21            0.58
Infit t, Mean             -1.30           -1.17
Infit t, SD               1.79            1.03
Outfit t, Mean            -0.74           -0.09
Outfit t, SD              1.81            0.77

0 items with zero scores; 0 cases with zero scores
0 items with perfect scores; 0 cases with perfect scores

Summary Results: 1993-1997 data anchored on 2000 and 2003 data

JW Items 1-40 Grades 3-10 1993+ Initial Test only (Run 6)
Item estimates (thresholds) and case estimates: all on all
(N = 902, L = 36, Probability Level = 0.50)

Statistic                 Item estimates  Case estimates
Mean                      0.06            0.47
SD                        1.23            0.63
SD (adjusted)             1.22            0.57
Reliability of estimate   1.00            0.83

Fit statistic             Item estimates  Case estimates
Infit mean square, Mean   0.89            0.95
Infit mean square, SD     0.22            0.30
Outfit mean square, Mean  0.88            0.88
Outfit mean square, SD    0.24            0.35
Infit t, Mean             -1.77           -0.25
Infit t, SD               3.71            1.16
Outfit t, Mean            -1.27           -0.18
Outfit t, SD              2.69            0.66

0 items with zero scores; 0 cases with zero scores
0 items with perfect scores; 0 cases with perfect scores

Appendix B: Sample Size, Means and Standard Deviations of all 28 samples

Grade  Statistic  1993   1995   1997   2000   2002   2003
3      n          322    303    237    176    -      189
       M          -0.78  -0.86  -0.51  -0.37  -      -1.25
       SD         0.67   0.61   0.76   0.75   -      0.74
5      n          -      465    226    183    114    -
       M          -      -0.30  -0.23  0.37   0.37   -
       SD         -      0.53   0.50   0.53   0.47   -
6      n          311    337    234    -      -      174
       M          -0.14  -0.15  -0.13  -      -      -0.47
       SD         0.52   0.55   0.60   -      -      0.62
7      n          -      -      314    186    102    -
       M          -      -      0.06   0.07   0.24   -
       SD         -      -      0.54   0.79   0.63   -
8      n          -      374    192    -      -      -
       M          -      0.21   0.13   -      -      -
       SD         -      0.58   0.58   -      -      -
9      n          392    371    105    193    135    251
       M          0.64   0.32   0.35   0.24   0.59   0.18
       SD         0.61   0.69   0.53   0.69   0.79   0.58
10     n          -      -      297    -      -      -
       M          -      -      0.67   -      -      -
       SD         -      -      0.64   -      -      -
11     n          -      118    51     -      59     -
       M          -      0.96   0.86   -      0.71   -
       SD         -      0.65   0.80   -      0.68   -

Appendix C

Comparisons of Mean Scores Across the Years for Each Grade (t values not
adjusted for Design Effect)

Grade 3
Years       Means         n         t       p       Effect size
1993, 1995  -0.78, -0.86  322, 303  -1.59   NS      -0.12 (V. Small)
1993, 1997  -0.78, -0.51  322, 237  4.48    <.0001  0.38 (Medium)
1993, 2000  -0.78, -0.37  322, 176  6.19    <.0001  0.59 (Medium)
1993, 2003  -0.78, -1.25  322, 189  -7.30   <.0001  -0.67 (Medium)
1995, 1997  -0.86, -0.51  303, 237  5.99    <.0001  0.51 (Medium)
1995, 2000  -0.86, -0.37  303, 176  7.74    <.0001  0.74 (Medium)
1995, 2003  -0.86, -1.25  303, 189  -6.26   <.0001  -0.59 (Medium)
1997, 2000  -0.51, -0.37  237, 176  1.76    NS      0.19 (Small)
1997, 2003  -0.51, -1.25  237, 189  -10.02  <.0001  -0.99 (Large)
2000, 2003  -0.37, -1.25  176, 189  -11.14  <.0001  -1.18 (Large)

Grade 5
Years       Means         n         t       p       Effect size
1995, 1997  -0.30, -0.23  465, 226  1.72    NS      0.13 (V. Small)
1995, 2000  -0.30, 0.37   465, 183  14.38   <.0001  1.26 (Large)
1995, 2002  -0.30, 0.37   465, 123  12.32   <.0001  1.29 (Large)
1997, 2000  -0.23, 0.37   226, 183  11.63   <.0001  1.17 (Large)
1997, 2002  -0.23, 0.37   226, 123  10.63   <.0001  1.23 (Large)
2000, 2002  0.37, 0.37    183, 123  0.02    NS      0.00

Grade 6
Years       Means         n         t       p       Effect size
1993, 1995  -0.14, -0.15  311, 337  -0.28   NS      -0.02 (V. Small)
1993, 1997  -0.14, -0.13  311, 233  0.23    NS      0.02 (V. Small)
1993, 2003  -0.14, -0.47  311, 174  -6.29   <.0001  -0.59 (Medium)
1995, 1997  -0.15, -0.13  337, 233  0.47    NS      0.04 (V. Small)
1995, 2003  -0.15, -0.47  337, 174  -5.97   <.0001  -0.56 (Medium)
1997, 2003  -0.13, -0.47  233, 174  -5.63   <.0001  -0.56 (Medium)

Grade 7
Years       Means         n         t       p       Effect size
1997, 2000  0.06, 0.07    314, 186  0.15    NS      0.02 (V. Small)
1997, 2002  0.06, 0.24    314, 102  2.66    <.004   0.23 (Small)
2000, 2002  0.07, 0.24    186, 102  1.79    <.04    0.32 (Small)

Grade 8
Years       Means         n         t       p       Effect size
1995, 1997  0.21, 0.13    374, 192  -1.57   NS      -0.14 (V. Small)

Grade 9
Years       Means         n         t       p       Effect size
1993, 1995  0.64, 0.32    392, 371  -6.87   <.0001  -0.49 (Medium)
1993, 1997  0.64, 0.35    392, 105  -4.54   <.0001  -0.49 (Medium)
1993, 2000  0.64, 0.24    392, 193  -7.18   <.0001  -0.63 (Medium)
1993, 2002  0.64, 0.59    392, 135  -0.88   NS      -0.08 (V. Small)
1993, 2003  0.64, 0.18    392, 251  -9.72   <.0001  -0.77 (Large)
1995, 1997  0.32, 0.35    371, 105  0.39    NS      0.05 (V. Small)
1995, 2000  0.32, 0.24    371, 193  -1.25   NS      -0.12 (V. Small)
1995, 2002  0.32, 0.59    371, 135  3.68    <.0001  0.38 (Small)
1995, 2003  0.32, 0.18    371, 251  -2.71   <.007   -0.22 (Small)
1997, 2000  0.35, 0.24    105, 193  -1.36   NS      -0.17 (V. Small)
1997, 2002  0.35, 0.59    105, 135  2.66    <.005   0.35 (Small)
1997, 2003  0.35, 0.18    105, 251  -2.62   <.01    -0.30 (Small)
2000, 2002  0.24, 0.59    193, 135  4.18    <.0001  0.48 (Medium)
2000, 2003  0.24, 0.18    193, 251  -1.11   NS      -0.10 (V. Small)
2002, 2003  0.59, 0.18    135, 251  -5.84   <.0001  -0.62 (Medium)

Grade 11
Years       Means         n         t       p       Effect size
1995, 1997  0.96, 0.86    118, 51   -0.90   NS      -0.14 (V. Small)
1995, 2002  0.96, 0.71    118, 59   -2.46   <.008   -0.38 (Small)
1997, 2002  0.86, 0.71    51, 59    -1.08   NS      -0.20 (Small)

Appendix D: Longitudinal Comparisons

Comparisons Across Years for Each Grade in the Longitudinal Samples (t values
not adjusted for Design Effect)

Grades    Years       Means         n    t      p       Effect size
3, 5, 7   1993, 1995  -0.80, -0.25  147  11.98  <.0001  0.95 (Large)
3, 5, 7   1995, 1997  -0.25, 0.14   147  9.30   <.0001  0.81 (Large)
6, 8, 10  1993, 1995  -0.10, 0.25   117  8.67   <.0001  0.65 (Medium)
6, 8, 10  1995, 1997  0.25, 0.73    117  10.60  <.0001  0.80 (Large)
9, 11     1993, 1995  0.84, 0.97    117  2.40   <.009   0.20 (Small)
9, 11     1995, 1997  0.67, 0.86    51   2.23   <.02    0.26 (Small)
3, 9      1997, 2003  -0.28, 0.28   54   5.52   <.0001  0.79 (Large)
3, 5      2000, 2002  -0.35, 0.37   114  12.26  <.0001  1.11 (Large)
5, 7      2000, 2002  0.34, 0.24    102  -2.17  <.02    -0.18 (Small)
7, 9      2000, 2002  0.13, 0.61    135  8.65   <.0001  0.67 (Medium)
9, 11     2000, 2002  0.51, 0.71    59   2.78   <.003   0.32 (Small)



